Since the identification of a new coronavirus (severe acute
respiratory syndrome coronavirus [SARS-CoV]) as the causative
agent of the SARS epidemic in the winter of 2002-2003,
the origin of the novel agent has remained a hotly debated 
topic. Which virus was the immediate ancestor of SARS-CoV, 
and what are the relationships between SARS-CoV and other 
previously described coronaviruses? Correct answers to these 
two questions are vital, as substantiated below, for designing 
strategies to detect, contain, and combat new outbreaks and 
for dissecting the fundamentals of the SARS-CoV life cycle. 

Major efforts have been invested in a thus far unsuccessful 
search for a natural SARS-CoV reservoir. In the meantime, 
and more outside the spotlight, SARS-CoV genome sequences 
have been used to define the phylogenetic position of SARS-CoV
among coronaviruses. These studies have resulted in
considerable controversy whose intricacies may not be clear to
outsiders. Our purpose is to clarify the situation from an
insider’s point of view.

Originally, coronaviruses were classified on the basis of
antigenic cross-reactivity, and in this manner three antigenic
groups (1 to 3) were recognized (14). When coronavirus genome
sequences began to accumulate, the same groups were
evident from phylogenetic analyses of the four structural
proteins, N, M, E, and S (19), and of different regions of the giant
replicase (3, 22). Group boundaries were also supported by the
diversity of small open reading frames (ORFs) encoding
accessory proteins, which are dispersed among the structural
protein genes in the 3'-proximal region of the genome (Fig. 1).
In the mid-1990s, the first discord between the antigenicity-based
and phylogenetic classifications emerged upon
the characterization of the coronavirus porcine epidemic
diarrhea virus (PEDV) and human coronavirus 229E (HCoV-229E),
one of the common cold viruses. These viruses proved
not to have antigenic cross-reactivity with members of the
established groups (18), yet on the basis of sequence
comparisons it was concluded that they segregate into group 1,
although they are somewhat separated from porcine transmissible
gastroenteritis virus and closely related viruses (subgroup
1b and subgroup 1a, respectively, in Fig. 2) (9). The PEDV and
HCoV-229E genomes also share an ORF specific for group 1
in the 3'-proximal region of their genome. The Coronavirus
Study Group of the International Committee on Taxonomy of
Viruses recognized these viruses as members of group 1 rather
than declaring them prototypes of new groups (6). This decision
effectively converted the original antigenic groups, which
were based essentially on some properties of one or a few viral
proteins, into genetic ones based on full-length genome
sequences, but this change was never acknowledged explicitly.
Consequently, no guidelines were established for
handling future disagreements between the classifications
based on antigenicity, genome organization, and phylogeny
should these arise from the properties of newly identified
coronaviruses, and SARS-CoV proved to be quite a classification
challenge.

Initial phylogenetic analyses suggested that the novel virus 
did not cluster with any of the three established coronavirus 
groups. Accordingly, SARS-CoV also has a unique pattern of 
small ORFs in the 3'-proximal region of its genome and a 
unique internal organization of its nonstructural protein 3 
(nsp3) replicase subunit, which includes a sizable novel domain 
(SARS-CoV unique domain [SUD]) and only one papain-like
protease (PL2pro) rather than the two copies commonly found
in other coronaviruses (Fig. 1). Although a thorough assessment
of the antigenic cross-reactivity of SARS-CoV with other
coronaviruses is yet to be published, a proposal to recognize
SARS-CoV as a representative of a new, fourth group of
coronaviruses seemed most logical (15, 17).

If SARS-CoV indeed represents a new group, then when, 
relative to other groups, could this lineage have emerged? 
Several scenarios are theoretically plausible, and one of the 
most extreme ones, which seems compatible with the unique 
characteristics of SARS-CoV, places the origin of this lineage 
next to the ancestor of the other coronaviruses (Fig. 2A). To 
rigorously infer the origin of SARS-CoV, we conducted a special
analysis of the replicase ORF1b region (Fig. 1), the most-conserved
part of the coronavirus genome, which accounts for
~20% of its size (20). In this analysis, the equine torovirus, a
distant relative of coronaviruses belonging to the genus Torovirus
of the same Coronaviridae family, was used as an outgroup
to infer the direction of coronavirus evolution.
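
The role of the outgroup in such an analysis can be illustrated with a
minimal sketch. It assumes a simplified unrooted topology that mirrors the
branching order described for Fig. 2B; it does not use the actual sequence
data or tree of reference 20. The internal node labels (n1 to n5) and the
function names are hypothetical and serve only this illustration. The tree is
rooted on the branch leading to the outgroup, and the sister group of a taxon
is then read off the rooted tree.

```python
from collections import defaultdict

# Simplified unrooted topology (an assumption for illustration only).
# n1..n5 are hypothetical labels for internal nodes.
EDGES = [
    ("EToV", "n1"), ("IBV", "n1"), ("n1", "n2"),
    ("n2", "n3"), ("n3", "TGEV"), ("n3", "HCoV-229E"),
    ("n2", "n4"), ("n4", "SARS-CoV"),
    ("n4", "n5"), ("n5", "MHV"), ("n5", "BCoV"),
]


def build_adjacency(edges):
    """Store the unrooted tree as an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def parents_from_outgroup(adj, outgroup):
    """Orient every branch away from the outgroup leaf (outgroup rooting)."""
    parent = {outgroup: None}
    stack = [outgroup]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                stack.append(neighbor)
    return parent


def leaves_below(adj, parent, node):
    """Collect the leaves descending from `node` in the rooted tree."""
    children = [n for n in adj[node] if n != parent[node]]
    if not children:
        return {node}
    leaves = set()
    for child in children:
        leaves |= leaves_below(adj, parent, child)
    return leaves


def sister_group(edges, outgroup, taxon):
    """Return the leaves of the clade(s) sharing the taxon's parent node."""
    adj = build_adjacency(edges)
    parent = parents_from_outgroup(adj, outgroup)
    node = parent[taxon]
    sisters = set()
    for neighbor in adj[node]:
        if neighbor != taxon and neighbor != parent[node]:
            sisters |= leaves_below(adj, parent, neighbor)
    return sisters


print(sorted(sister_group(EDGES, "EToV", "SARS-CoV")))  # ['BCoV', 'MHV']
```

With the torovirus as the outgroup, the sister group of SARS-CoV in this toy
topology consists of the group 2 viruses, and the sister group of IBV
comprises all mammalian coronaviruses. An unrooted tree alone does not fix
these relationships, which is why the choice of outgroup matters.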

Surprisingly, our fully resolved tree demonstrated that the 
SARS-CoV lineage is an early split-off from the group 2 branch 
and that the split-off occurred relatively late in coronavirus 
evolution, after the two bifurcations that gave rise to the three 
previously established groups (Fig. 2B). This topology is
unlikely to be skewed, as it was obtained by using different criteria
and both protein and nucleotide sequences as well as in an
independent study (21). The early split-off of the (avian) group
3 coronaviruses from all mammalian coronaviruses also adds
to the credibility of the tree. Based on these observations, it
was proposed (20) that the group 2 coronaviruses be split into
subgroup 2a, which includes the established group 2
coronaviruses like mouse hepatitis virus (MHV) and bovine
coronavirus (BCoV), and subgroup 2b, of which SARS-CoV is the
prototype.

FIG. 1. Overview of the SARS-CoV genome organization and
proteins (20). The positions of coronavirus group-specific genetic
and structural markers are indicated. SARS-CoV has the markers
of group 2 coronaviruses. In the N-terminal part of the S protein,
only Cys residues are distributed in a group-specific fashion (5)
(dashed line). SUD, SARS-CoV-unique domain; PL2pro, group 2
papain-like protease 2; TM2, transmembrane domain 2; 3CLpro,
chymotrypsin-like protease related to 3Cpro of picornaviruses.

FIG. 2. Two alternative scenarios of SARS-CoV evolution, shown
on the same coronavirus tree with alternative roots. The SARS-CoV
branch splits, for the sake of comparison, from an ancestor of the other
known coronaviruses (A) or, as defined in reference 20, from the group
2 branch (B). IBV, infectious bronchitis virus; TGEV, porcine
transmissible gastroenteritis virus; EToV, equine torovirus. BCoV-Lun and
other BCoV strains are most closely related to HCoV-OC43, which
was not included in the analysis.

This classification of SARS-CoV, one of the least expected, 
appeared to be irreconcilable with other data and was fiercely 
debated. Some researchers, unaware of the complex relationship
between phylogeny- and antigenicity-based classifications
of coronaviruses (see above), may have been misled into
believing that this tree (Fig. 2B) implies that SARS-CoV shares
antigenic cross-reactivity with established group 2 viruses.
Others may have had a hard time reconciling all the unique
properties of SARS-CoV with its group 2 membership. Is the latter
gap indeed as large as it seems?

Inspection of the published unrooted trees reveals that in 
the majority, the SARS-CoV branch indeed originates on the 
group 2 side. Other topologies were originally published only 
for the M and E structural proteins, but the M-protein tree was 
later revised in favor of the dominant topology (5). The level of 
protein conservation behind these topologies is marginally 
higher between SARS-CoV and group 2 viruses than between 
group 1 and 3 viruses and, despite its consistent character, may 
not appear very impressive. However, this margin is significant 
on the evolutionary scale, since in addition, distinctive group
2-specific genetic and/or structural markers were recently
identified for SARS-CoV in the nsp1 and nsp3 replicase subunits
(20), the S1 portion of the S protein (5), and, at the RNA level,
the 3' untranslated region (3'UTR) of the genome (8) (Fig. 1).

The conspicuous genetic differences between SARS-CoV
and subgroup 2a coronaviruses may not be completely surprising
either. Similar, albeit less profound, differences are also
evident upon comparison of regions encoding nsp3 and comparable
genome proteins encoded downstream of ORF1b in
viruses of subgroups 1a and 1b (6, 20). Furthermore, coronaviruses
tolerate deletions and reshuffling of ORFs in the 3'-proximal
region of their genome in genetic experiments (4).
Also, in SARS-CoV the rapid evolution of unique 3'-proximal
ORFs has already been reported, both in the field (ORF8a and
ORF8b) (10) and in Vero cell cultures (ORF7b) (20, 23).
Thus, there is little in the available data that could not be 
reconciled with the classification of SARS-CoV as a group 2 
coronavirus. 

This and any other hierarchical classification is subject to the
condition that recombination, a well-known phenomenon in
coronaviruses (11, 13), is mostly restricted by group boundaries.
This notion was recently challenged in two studies claiming
that the putative RNA-dependent RNA polymerase
(RdRp) locus of SARS-CoV (nsp12 in Fig. 1) has resulted
from recombination of viruses from groups 1 and 4 (16) and
that the M and N protein genes are of group 3 origin (21). The
genes identified as being of recombinant origin specify key
functions of coronavirus replication and virion biogenesis,
respectively, and the above results were obtained with advanced
phylogenetic methods which were never before applied to
coronaviruses. Taken together, these considerations seem to
argue for the mosaic origin of these SARS-CoV major genes
and raise the question of whether it will be possible at all to
build a meaningful classification for SARS-CoV and other
coronaviruses. However, we believe that it is premature to
accept these conclusions. For instance, it remains unresolved
why the authors of these studies (16, 21) were unable to
verify each other’s findings and why others, who analyzed
the same regions, have missed these recombinations (7). For
the RdRp results, it is also worth noting that unless very close
relatives are involved, recombinants generated by exchange
within regions encoding key replicative enzymes of RNA
viruses may have poor viability (for examples, see reference 1),
emphasizing that claims to the contrary should be validated
extensively.

Recombination aside, what is at stake? Does it really matter 
whether SARS-CoV is a distant member of group 2 or the 
prototype of a group of its own? It does. The clustering of 
SARS-CoV within the younger subgroup 2b rather than the 
older group 4 signifies that relatively small virus diversity can 
be anticipated in the SARS-CoV lineage, and consequently, 
the unique genetic properties of SARS-CoV must have 
evolved relatively fast on the scale of coronavirus evolution. It 
further predicts that SARS-CoV may be more prone to further 
gross changes than suggested by its alternative classification in 
the fourth group. It identifies group 2 viruses, including the 
extensively characterized MHV, as the most suitable virus 
models to characterize SARS-CoV-specific genes and/or domains,
in particular during dissection of the replicative machinery
of the virus and processes involving group-specific
determinants. It also indicates that analysis of the close
BCoV/HCoV-OC43 pair from group 2 could provide useful insights for
understanding the adaptation of animal coronaviruses to humans
in general and the emergence of SARS-CoV in particular.
In summary, decision making in both fundamental and
applied research is not going to be the same with SARS-CoV 
being recognized as a member of subgroup 2b rather than 
group 4. 

What is next? Part of the original and remaining confusion 
concerning SARS-CoV phylogeny could be attributed to technical
aspects, like alignment quality, choice of viruses and
genes analyzed, and software used. These issues remain crucial
if the ongoing phylogenetic analysis of SARS-CoV is to result
in a broad consensus. In particular, this future accord should
accommodate RNA recombination. Although the swapping of
conserved genes or domains between distant coronaviruses,
including SARS-CoV, is yet to be independently verified (see
above), the involvement of recombination in the evolution of
the SARS-CoV lineage (subgroup 2b) per se is beyond doubt.
This assertion is based on the numerous data identifying RNA
recombination as the major mechanism for gross evolution of
RNA virus genomes (2, 12). Consequently, the most distinctive
genetic characteristics of SARS-CoV in the nsp3 gene and in
the 3'-proximal region of the genome must have evolved by
recombination involving either SARS-CoV itself, its subgroup
2b ancestors, or both. In this respect, the identification of the
parental sequences of these SARS-CoV-specific genes is
clearly very important, and the sequence analysis of new
subgroup 2b coronavirus genomes may help in this quest. The
functional characterization of group-specific genetic markers,
as initiated for the 3'UTR (8), is essential for providing the
foundations for a biologically appealing definition of new
groups. Finally, the whole controversy arose in the first place
because the field had become used to dealing with classification
matters on a case-by-case basis and had never before
experienced any urgency in this regard. Now it is high time for
the coronavirus community to make up its mind with respect to
the relative classification value of phylogenetic analyses of
replicative, structural, and accessory proteins and to formulate the
principles to build a consistent and reliable coronavirus
classification system.